Input-Dependent Estimation of Generalization Error under Covariate Shift

نویسنده

  • Masashi Sugiyama
چکیده

A common assumption in supervised learning is that the training and test input points follow the same probability distribution. However, this assumption is not fulfilled, e.g., in interpolation, extrapolation, active learning, or classification with imbalanced data. The violation of this assumption—known as the covariate shift— causes a heavy bias in standard generalization error estimation schemes such as cross-validation or Akaike’s information criterion, and thus they result in poor model selection. In this paper, we propose an alternative estimator of the generalization error for the squared loss function when training and test distributions are different. The proposed generalization error estimator is shown to be exactly unbiased for finite samples if the learning target function is realizable and asymptotically unbiased in general. We also show that, in addition to the unbiasedness, the proposed generalization error estimator can accurately estimate the difference of the generalization error among different models, which is a desirable property in model selection. Numerical studies show that the proposed method compares favorably with existing model selection methods in regression for extrapolation and in classification with imbalanced data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Model Selection Under Covariate Shift

A common assumption in supervised learning is that the training and test input points follow the same probability distribution. However, this assumption is not fulfilled, e.g., in interpolation, extrapolation, or active learning scenarios. The violation of this assumption— known as the covariate shift—causes a heavy bias in standard generalization error estimation schemes such as cross-validati...

متن کامل

Mixture Regression for Covariate Shift

In supervised learning there is a typical presumption that the training and test points are taken from the same distribution. In practice this assumption is commonly violated. The situations where the training and test data are from different distributions is called covariate shift. Recent work has examined techniques for dealing with covariate shift in terms of minimisation of generalisation e...

متن کامل

Addressing Covariate Shift in Active Learning with Adversarial Prediction

Active learning approaches used in practice are generally optimistic about their certainty with respect to data shift between labeled and unlabeled data. They assume that unknown datapoint labels follow the inductive biases of the active learner. As a result, the most useful datapoint labels— ones that refute current inductive biases—are rarely solicited. We propose an adversarial approach to a...

متن کامل

Generalization Bounds for Transfer Learning under Model Shift

Transfer learning (sometimes also referred to as domain-adaptation) algorithms are often used when one tries to apply a model learned from a fully labeled source domain, to an unlabeled target domain, that is similar but not identical to the source. Previous work on covariate shift focuses on matching the marginal distributions on observations X across domains while assuming the conditional dis...

متن کامل

A Conditional Expectation Approach to Model Selection and Active Learning under Covariate Shift

In the previous chapter, Kanamori and Shimodaira provided generalization error estimators which can be used for model selection and active learning. The accuracy of these estimators is theoretically guaranteed in terms of the expectation over realizations of training input-output samples. In practice, we are only given a single realization of training samples. Therefore, ideally, we want to hav...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005